Forbidden Extension Queries
Abstract
Document retrieval is one of the most fundamental problems in information retrieval. The objective is to retrieve all documents from a document collection that are relevant to an input pattern. Several variations of this problem, such as ranked document retrieval, document listing with two patterns, and document listing with forbidden patterns, have been studied. We introduce the problem of document retrieval with forbidden extensions.

Let $\mathcal{D} = \{T_1, T_2, \ldots, T_D\}$ be a collection of $D$ string documents of $n$ characters in total, and let $P^+$ and $P^-$ be two query patterns, where $P^+$ is a proper prefix of $P^-$. We call $P^-$ the forbidden extension of the included pattern $P^+$. A forbidden extension query $\langle P^+, P^- \rangle$ asks to report all $occ$ documents in $\mathcal{D}$ that contain $P^+$ as a substring but do not contain $P^-$ as one. A top-$k$ forbidden extension query $\langle P^+, P^-, k \rangle$ asks to report the $k$ documents among those $occ$ documents that are most relevant to $P^+$.

We present a linear-space index (in words) with $O(|P^-| + occ)$ query time for the document listing problem. For the top-$k$ version of the problem, when the relevance of a document is based on PageRank, we achieve the following results:
(1) an $O(n)$-space (in words) index with $O(|P^-| \log \sigma + k)$ query time, where $\sigma$ is the size of the alphabet from which the characters in $\mathcal{D}$ are drawn; for constant alphabets, this yields an optimal query time of $O(|P^-| + k)$;
(2) for any constant $\epsilon > 0$, a $|CSA| + |CSA^*| + D \log \frac{n}{D} + O(n)$-bit index with $O(search(P) + k \cdot t_{SA} \cdot \log^{2+\epsilon} n)$ query time, where $search(P)$ is the time to find the suffix range of a pattern $P$, $t_{SA}$ is the time to find a suffix (or inverse suffix) array value, and $|CSA^*|$ denotes the maximum of the space needed to store the compressed suffix array $CSA$ of the concatenated text of all documents and the total space needed to store the individual $CSA$ of each document.

1998 ACM Subject Classification: F.2.2 Pattern Matching
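The query semantics can be illustrated with a deliberately naive baseline. This is not the index described above and does not achieve the stated bounds; it is a minimal sketch assuming a hypothetical separator character `\x00` that never occurs in the documents, a plain (uncompressed) generalized suffix array built by sorting, and a hypothetical `pagerank` dictionary of per-document scores for the top-$k$ variant. It relies on one fact that follows from the definition: because $P^+$ is a prefix of $P^-$, every occurrence of $P^-$ is also an occurrence of $P^+$, so the suffix range of $P^-$ is nested inside that of $P^+$, and a document qualifies exactly when it owns some suffix in the $P^+$ range and none in the $P^-$ range.

```python
import heapq

SEP = "\x00"  # assumption: a separator that never occurs inside any document


def build_index(docs):
    """Concatenate docs with separators and build a naive generalized suffix array.

    Returns (text, sa, doc_of), where doc_of[i] is the document that owns text[i].
    Sorting full suffixes is far from linear time; this is for illustration only.
    """
    parts, doc_of = [], []
    for d, t in enumerate(docs):
        parts.append(t + SEP)
        doc_of.extend([d] * (len(t) + 1))
    text = "".join(parts)
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, doc_of


def suffix_range(text, sa, p):
    """Half-open range [lo, hi) of suffix-array slots whose suffix starts with p."""
    m = len(p)
    lo, hi = 0, len(sa)
    while lo < hi:  # first slot whose length-m prefix is >= p
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first slot whose length-m prefix is > p
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def forbidden_extension(index, p_plus, p_minus):
    """Report ids of documents that contain p_plus but not p_minus."""
    if not (p_minus.startswith(p_plus) and len(p_minus) > len(p_plus)):
        raise ValueError("p_plus must be a proper prefix of p_minus")
    text, sa, doc_of = index
    lo, hi = suffix_range(text, sa, p_plus)
    lo2, hi2 = suffix_range(text, sa, p_minus)  # nested inside [lo, hi)
    included = {doc_of[sa[i]] for i in range(lo, hi)}
    forbidden = {doc_of[sa[i]] for i in range(lo2, hi2)}
    return sorted(included - forbidden)


def top_k_forbidden_extension(index, p_plus, p_minus, k, pagerank):
    """Top-k variant: rank reported documents by a static per-document score."""
    hits = forbidden_extension(index, p_plus, p_minus)
    return heapq.nlargest(k, hits, key=lambda d: pagerank[d])


if __name__ == "__main__":
    idx = build_index(["banana", "bandana", "cabana", "ban"])
    print(forbidden_extension(idx, "ban", "bana"))  # [1, 3]
    scores = {0: 0.4, 1: 0.1, 2: 0.3, 3: 0.2}  # hypothetical PageRank values
    print(top_k_forbidden_extension(idx, "ban", "bana", 1, scores))  # [3]
```

In the usage example, "banana" and "cabana" contain the forbidden extension "bana" and are filtered out, while "bandana" and "ban" contain "ban" only. The set difference scans the whole $P^+$ range, so its cost grows with the number of occurrences of $P^+$ rather than with $occ$; avoiding that scan is what the index in the abstract is designed for.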
Similar resources
Forbidden Patterns
We consider the problem of indexing a collection of documents (a.k.a. strings) of total length n such that the following kind of queries are supported: given two patterns P and P−, list all $n_{match}$ documents containing P but not P−. This is a natural extension of the classic problem of document listing as considered by Muthukrishnan [SODA’02], where only the positive pattern P is given. Our main...
Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML
As we move to a big data world where data is growing in an unstructured way and at high velocity, there is a need for data stores that can hold such large amounts of data. Traditionally, relational databases have been used, but they are not suited to handling this large amount of data, so it is necessary to move to non-relational data stores. In the current study, we have proposed an extension of the Mongo...
Pairs of forbidden class of subgraphs concerning K1, 3 and P6 to have a cycle containing specified vertices
In [3], Faudree and Gould showed that if a 2-connected graph contains neither K1,3 nor P6 as an induced subgraph, then the graph is hamiltonian. In this paper, we consider the extension of this result to cycles passing through specified vertices. We define the families of graphs which are extensions of the forbidden pair K1,3 and P6, and prove that the forbidden families imply the existence of cycl...
Extending the Qualitative Trajectory Calculus Based on the Concept of Accessibility of Moving Objects in the Paths
Qualitative spatial representation and reasoning are among the important capabilities in intelligent geospatial information system development. Although a large contribution to the study of moving objects has been attributed to the quantitative use and analysis of data, such calculations are ineffective when there is little inaccurate data on position and geometry or when explicitly explaining ...
Published: 2015